The search functionality is under construction.

Author Search Result

[Author] Michitaka KAMEYAMA(65hit)

41-60hit(65hit)

  • Code Assignment Algorithm for Highly Parallel Multiple-Valued Combinational Circuits Based on Partition Theory

    Saneaki TAMAKI  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Logic Design

      Vol:
    E76-D No:5
      Page(s):
    548-554

    Design of locally computable combinational circuits is a very important subject to implement high-speed compact arithmetic and logic circuits in VLSI systems. This paper describes a multiple-valued code assignment algorithm for the locally computable combinational circuits, when a functional specification for a unary operation is given by the mapping relationship between input and output symbols. Partition theory usually used in the design of sequential circuits is effectively employed for the fast search for the code assignment problem. Based on the partition theory, mathematical foundation is derived for the locally computable circuit design. Moreover, for permutation operations, we propose an efficient code assignment algorithm based on closed chain sets to reduce the number of combinations in search procedure. Some examples are shown to demonstrate the usefulness of the algorithm.

  • A Bit-Serial Reconfigurable VLSI Based on a Multiple-Valued X-Net Data Transfer Scheme

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-Computer System

      Vol:
    E96-D No:7
      Page(s):
    1449-1456

    A multiple-valued data transfer scheme using X-net is proposed to realize a compact bit-serial reconfigurable VLSI (BS-RVLSI). In the multiple-valued data transfer scheme using X-net, two binary data can be transferred from two adjacent cells to one common adjacent cell simultaneously at each “X” intersection. One cell composed of a logic block and a switch block is connected to four adjacent cross points by four one-bit switches so that the complexity of the switch block is reduced to 50% in comparison with the cell of a BS-RVLSI using an eight nearest-neighbor mesh network (8-NNM). In the logic block, threshold logic circuits are used to perform threshold operations, and then their binary dual-rail voltage outputs enter a binary logic module which can be programmed to realize an arbitrary two-variable binary function or a bit-serial adder. As a result, the configuration memory count and transistor count of the proposed multiple-valued cell are reduced to 34% and 58%, respectively, in comparison with those of an equivalent CMOS cell. Moreover, its power consumption for an arbitrary 2-variable binary function becomes 67% at 800 MHz under the condition of the same delay time.

  • Architecture of a Stereo Matching VLSI Processor Based on Hierarchically Parallel Memory Access

    Masanori HARIYAMA  Haruka SASAKI  Michitaka KAMEYAMA  

     
    PAPER-Digital Circuits and Computer Arithmetic

      Vol:
    E88-D No:7
      Page(s):
    1486-1491

    This paper presents a VLSI processor for high-speed and reliable stereo matching based on adaptive window-size control of SAD(Sum of Absolute Differences) computation. To reduce its computational complexity, SADs are computed using multi-resolution images. Parallel memory access is essential for highly parallel image processing. For parallel memory access, this paper also presents an optimal memory allocation that minimizes the hardware amount under the condition of parallel memory access at specified resolutions.

  • Memory Allocation for Multi-Resolution Image Processing

    Yasuhiro KOBAYASHI  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-VLSI Systems

      Vol:
    E91-D No:10
      Page(s):
    2386-2397

    Hierarchical approaches using multi-resolution images are well-known techniques to reduce the computational amount without degrading quality. One major issue in designing image processors is to design a memory system that supports parallel access with a simple interconnection network. The complexity of the interconnection network mainly depends on memory allocation; it maps pixels onto memory modules and determines the required number of memory modules. This paper presents a memory allocation method to minimize the number of memory modules for image processing using multi-resolution images. For efficient search, the proposed method exploits the regularity of window-type image processing. A practical example demonstrates that the number of memory modules is reduced to less than 14% that of conventional methods.

  • A Collision Detection Processor for Intelligent Vehicles

    Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E76-C No:12
      Page(s):
    1804-1811

    Since carelessness in driving causes a terrible traffic accident, it is an important subject for a vehicle to avoid collision autonomously. Real-time collision detection between a vehicle and obstacles will be a key target for the next-generation car electronics system. In collision detection, a large storage capacity is usually required to store the 3-D information on the obstacles lacated in a workspace. Moreover, high-computational power is essential not only in coordinate transformation but also in matching operation. In the proposed collision detection VLSI processor, the matching operation is drastically accelerated by using a Content-Addressable Memory (CAM) which evaluates the magnitude relationships between an input word and all the stored words in parallel. A new obstacle representation based on a union of rectangular solids is also used to reduce the obstacle memory capacity, so that the collision detection can be parformed only by parallel magnitude comparison. Parallel architecture using several identical processor elements (PEs) is employed to perform the coordinate transformation at high speed based on the COordinate Rotation DIgital Computation (CORDIC) algorithms. The collision detection time becomes 5.2 ms using 20 PEs and five CAMs with a 42-kbit capacity.

  • Multiple-Valued Logic-in-Memory VLSI Architecture Based on Floating-Gate-MOS Pass-Transistor Logic

    Takahiro HANYU  Michitaka KAMEYAMA  

     
    PAPER-Non-Binary Architectures

      Vol:
    E82-C No:9
      Page(s):
    1662-1668

    A new logic-in-memory VLSI architecture based on multiple-valued floating-gate-MOS pass-transistor logic is proposed to solve the communication bottleneck between memory and logic modules. Multiple-valued stored data are represented by the threshold voltage of a floating-gate MOS transistor, so that a single floating-gate MOS transistor is effectively employed to merge multiple-valued threshold-literal and pass-switch functions. As an application, a four-valued logic-in-memory VLSI for high-speed pattern recognition is also presented. The proposed VLSI detects a stored reference word with the minimum Manhattan distance between a 16-bit input word and 16-bit stored reference words. The effective chip area, the switching delay and the power dissipation of a new four-valued full adder, which is a key component of the proposed logic-in-memory VLSI, are reduced to about 33 percent, 67 percent and 24 percent, respectively, in comparison with those of the corresponding binary CMOS implementation under a 0.5-µm flash EEPROM technology.

  • A Multiple-Valued Reconfigurable VLSI Architecture Using Binary-Controlled Differential-Pair Circuits

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E96-C No:8
      Page(s):
    1083-1093

    This paper presents a fine-grain bit-serial reconfigurable VLSI architecture using multiple-valued switch blocks and binary logic modules. Multiple-valued signaling is utilized to implement a compact switch block. A binary-controlled current-steering technique is introduced, utilizing a programmable three-level differential-pair circuit to implement a high-performance low-power arbitrary two-variable binary function, and increase the noise margins in comparison with the quaternary-controlled differential-pair circuit. A current-source sharing technique between a series-gating differential-pair circuit and a current-mode D-latch is proposed to reduce the current source count and improve the speed. It is demonstrated that the power consumption and the delay of the proposed multiple-valued cell based on the binary-controlled current-steering technique and the current-source-sharing technique are reduced to 63% and 72%, respectively, in comparison with those of a previous multiple-valued cell.

  • An FPGA-Oriented Motion-Stereo Processor with a Simple Interconnection Network for Parallel Memory Access

    Seunghwan LEE  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Image Processing, Image Pattern Recognition

      Vol:
    E83-D No:12
      Page(s):
    2122-2130

    In designing a field-programmable gate array (FPGA)-based processor for motion stereo, a parallel memory system and a simple interconnection network for parallel data transfer are essential for parallel image processing. This paper, firstly, presents an FPGA-oriented hierarchical memory system. To reduce the bandwidth requirement between an on-chip memory in an FPGA and external memories, we propose an efficient scheduling: Once pixels are transferred to the on-chip memory, operations associated with the data are consecutively performed. Secondly, a rectangular memory allocation is proposed which allocates pixels to be accessed in parallel onto different memory modules of the on-chip memory. Consequently, completely parallel access can be achieved. The memory allocation also minimizes the required capacity of the on-chip memory and thus is suitable for FPGA-based implementation. Finally, a functional unit allocation is proposed to minimize the complexity between memory modules and functional units. An experimental result shows that the performance of the processor becomes 96 times higher than that of a 400 MHz Pentium II.

  • Implementation of a Low-Power FPGA Based on Synchronous/Asynchronous Hybrid Architecture

    Shota ISHIHARA  Ryoto TSUCHIYA  Yoshiya KOMATSU  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Electronic Circuits

      Vol:
    E94-C No:10
      Page(s):
    1669-1679

    This paper presents a low-power FPGA based on mixed synchronous/asynchronous design. The proposed FPGA consists of several sections which consist of logic blocks, and each section can be used as either a synchronous circuit or an asynchronous circuit according to its workload. An asynchronous circuit is power-efficient for a low-workload section since it does not require the clock tree which always consumes the power. On the other hand, a synchronous circuit is power-efficient for a high-workload section because of its simple hardware. The major consideration is designing an area-efficient synchronous/asynchronous hybrid logic block. This is because the hardware amount of the asynchronous circuit is about double that of the synchronous circuit, and the typical implementation wastes half of the hardware in synchronous mode. To solve this problem, we propose a hybrid logic block that can be used as either a single asynchronous logic block or two synchronous logic blocks. The proposed FPGA is fabricated using a 65-nm CMOS process. When the workload of a section is below 22%, asynchronous mode is more power-efficient than synchronous mode. Otherwise synchronous mode is more power-efficient.

  • A VLSI-Oriented Digital Signal Processor Based on Pulse-Train Residue Arithmetic Circuit with a Multiplier

    Michitaka KAMEYAMA  Oluwole ADEGBENRO  Tatsuo HIGUCHI  

     
    PAPER-Signal Processing

      Vol:
    E68-E No:1
      Page(s):
    14-21

    This paper proposes a new residue number multiplication scheme based on the cylic type of relationship which exists between the entries in the residue number multiplication truth-table when the modulus is any prime number. Using the scheme, multiplication is direct without table consultation and an entire truth-table is realizable. The multiplier circuit is simple and compact and allows pipelined processing of data. The flexibility of the multiplier is exploited in the implementation of an RNS based high-order FIR digital filter by using a programmable low order section. The suitability of the modular digital processor for VLSI is also indicated.

  • Multiple-Valued Fine-Grain Reconfigurable VLSI Using a Global Tree Local X-Net Network

    Xu BAI  Michitaka KAMEYAMA  

     
    PAPER-VLSI Architecture

      Vol:
    E97-D No:9
      Page(s):
    2278-2285

    A global tree local X-net network (GTLX) is introduced to realize high-performance data transfer in a multiple-valued fine-grain reconfigurable VLSI (MVFG-RVLSI). A global pipelined tree network is utilized to realize high-performance long-distance bit-parallel data transfer. Moreover, a logic-in-memory architecture is employed for solving data transfer bottleneck between a block data memory and a cell. A local X-net network is utilized to realize simple interconnections and compact switch blocks for eight-near neighborhood data transfer. Moreover, multiple-valued signaling is utilized to improve the utilization of the X-net network, where two binary data can be transferred from two adjacent cells to one common adjacent cell simultaneously at each “X” intersection. To evaluate the MVFG-RVLSI, a fast Fourier transform (FFT) operation is mapped onto a previous MVFG-RVLSI using only the X-net network and the MVFG-RVLSI using the GTLX. As a result, the computation time, the power consumption and the transistor count of the MVFG-RVLSI using the GTLX are reduced by 25%, 36% and 56%, respectively, in comparison with those of the MVFG-RVLSI using only the X-net network.

  • Design and Implementation of a Low-Power Multiple-Valued Current-Mode Integrated Circuit with Current-Source Control

    Takahiro HANYU  Satoshi KAZAMA  Michitaka KAMEYAMA  

     
    PAPER-Multiple-Valued Architectures

      Vol:
    E80-C No:7
      Page(s):
    941-947

    A new multiple-valued current-mode (MVCM) integrated circuit using a switched current-source control technique is proposed for a 1.5 V-supply high-speed arithmetic circuit with low-power dissipation. The use of a differential logic circuit (DLC) with a pair of dual-rail inputs makes the input voltage swing small, which results in a high driving capability at a lower supply voltage, while having large static power dissipation. In the proposed DLC using a switched current control technique, the static power dissipation can be greatly reduced because current sources in non-active circuit blocks are turned off. Since the gate of each current source is directly controlled by using a multiphase clock whose technique has been already used in dynamic circuit design, no additional transistors are required for currentsource control. As a typical example of arithmetic circuits, a new 1.5 V-supply 5454-bit multiplier based on a 0.8µm standard CMOS technology is also designed. Its performance is about 1.3 times faster than that of a binary fastest multiplier under the normalized power dissipation. A prototype chip is also fabricated to confirm the basic operation of the proposed MVCM integrated circuit.

  • A Three-Dimensional Instrumentation VLSI Processor Based on a Concurrent Memory-Access Scheme

    Seunghwan LEE  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Integrated Electronics

      Vol:
    E80-C No:11
      Page(s):
    1491-1498

    Three-dimensional (3-D) instrumentation using an image sequence is a promising instrumentation method for intelligent systems in which accurate 3-D information is required. However, real-time instrumentation is difficult since much computation time and a large memory bandwidth are required. In this paper, a 3-D instrumentation VLSI processor with a concurrent memory-access scheme is proposed. To reduce the access time, frequently used data are stored in a cache register array and are concurrently transferred to processing elements using simple interconnections to the 8-nearest neighbor registers. Based on a row and column memory access pattern, we propose a diagonally interleaved frame memory by which pixel values of a row and column are stored across memory modules. Based on the concurrent memory-access scheme, a 40 GOPS vprocessor is designed and the delay time for the instrumentation is estimated to be 42 ms for a 256256 images.

  • Parallel VLSI Processors for Robotics Using Multiple Bus Interconnection Networks

    Bumchul KIM  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Robot Electronics

      Vol:
    E75-A No:6
      Page(s):
    712-719

    This paper proposes parallel VLSI processors for robotics based on multiple processing elements organized around multiple bus interconnection networks. The advantages of multiple bus interconnection networks are generality, simplicity of implementation and capability of parallel communications between processing elements, therefore it is considered to be suitable for parallel VLSI systems. We also propose the optimal scheduling formulated in an integer programming problem to minimize the delay time of the parallel VLSI processors.

  • FOREWORD

    Michitaka KAMEYAMA  

     
    FOREWORD

      Vol:
    E90-C No:10
      Page(s):
    1849-1849
  • Field-Programmable VLSI Based on a Bit-Serial Fine-Grain Architecture

    Masanori HARIYAMA  Weisheng CHONG  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E87-C No:11
      Page(s):
    1897-1902

    This paper presents a novel architecture to solve two problems of existing FPGAs : the large delay and area due to complex programmable switch blocks, and the large area due to coarse-grain logic blocks that are underutilized to a great degree. A mesh-connected cellular array based on a bit-serial pipeline architecture is introduced to minimize complexity of switch blocks. A fine-grain logic block architecture with a functionality of a bit-serial adder is presented to minimize the number of inputs and outputs of the logic block since increase in the number of inputs and outputs directly increases the complexity of a switch block. For an area-efficient design, the logic block is implemented based on a hybrid of a programmable logic gate and a dedicated carry logic. The hybrid architecture allows us to use a small lookup table to implement the logic gate. Moreover, the carry logic uses a functional pass-gate that merges both logic and storage functions compactly. The performance of the fine-grain field-programmable VLSI (FPVLSI) is evaluated to be more than 2 times higher than that of a coarse-grain FPVLSI.

  • FOREWORD Open Access

    Michitaka KAMEYAMA  

     
    FOREWORD

      Vol:
    E93-D No:8
      Page(s):
    2025-2025
  • An Asynchronous FPGA Based on LEDR/4-Phase-Dual-Rail Hybrid Architecture

    Shota ISHIHARA  Yoshiya KOMATSU  Masanori HARIYAMA  Michitaka KAMEYAMA  

     
    PAPER-Electronic Circuits

      Vol:
    E93-C No:8
      Page(s):
    1338-1348

    This paper presents an asynchronous FPGA that combines 4-phase dual-rail encoding and LEDR (Level-Encoded Dual-Rail) encoding. 4-phase dual-rail encoding is employed to achieve small area and low power for function units, while LEDR encoding is employed to achieve high throughput and low power for the data transfer using programmable interconnection resources. Area-efficient protocol converters and their control circuits are also proposed in transistor-level implementation. The proposed FPGA is designed using the e-Shuttle 65nm CMOS process. Compared to the 4-phase-dual-rail-based FPGA, the throughput is increased by 69% with almost the same transistor count. Compared to the LEDR-based FPGA, the transistor count is reduced by 47% with almost the same throughput. In terms of power consumption, the proposed FPGA achieves the lowest power compared to the 4-phase-dual-rail-based and the LEDR-based FPGAs. Compared to the synchronous FPGA, the proposed FPGA has lower power consumption when the workload is below 35%.

  • A VLSI-Oriented Model-Based Robot Vision Processor for 3-D Instrumentation and Object Recognition

    Yoshifumi SASAKI  Michitaka KAMEYAMA  

     
    PAPER

      Vol:
    E77-C No:7
      Page(s):
    1116-1122

    In robot vision system, enormously large computation power is required to perform three-dimensional (3-D) instrumentation and object recognition. However, many kinds of complex and irregular operations are required to make accurate 3-D instrumentation and object recognition in the conventional method for software implementation. In this paper, a VLSI-oriented Model-Based Robot Vision (MBRV) processor is proposed for high-speed and accurate 3-D instrumentation and object recognition. An input image is compared with two-dimensional (2-D) silhouette images which are generated from the 3-D object models by means of perspective projection. Because the MBRV algorithm always gives the candidates for the accurate 3-D instrumentation and object recognition result with simple and regular procedures, it is suitable for the implementation of the VLSI processor. Highly parallel architecture is employed in the VLSI processor to reduce the latency between the image acquisition and the output generation of the 3-D instrumentation and object recognition results. As a result, 3-D instrumentation and object recognition can be performed 10000 times faster than a 28.5 MIPS workstation.

  • Unified Scheduling of High Performance Parallel VLSI Processors for Robotics

    Bumchul KIM  Michitaka KAMEYAMA  Tatsuo HIGUCHI  

     
    PAPER-Parallel Processor Scheduling

      Vol:
    E76-A No:6
      Page(s):
    904-910

    The performance of processing elements can be improved by the progress of VLSI circuit technology, while the communication overhead can not be negligible in parallel processing system. This paper presents a unified scheduling that allocates tasks having different task processing times in multiple processing elements. The objective function is formulated to measure communication time between processing elements. By employing constraint conditions, the scheduling efficiently generates an optimal solution using an integer programming so that minimum communication time can be achieved. We also propose a VLSI processor for robotics whose latency is very small. In the VLSI processor, the data transfer between two processing elements can be done very quickly, so that the communication cycle time is greatly reduced.

41-60hit(65hit)